Regression models were developed in R using packages rstanarm, and ggplot. Full Bayesian inference used Hamiltonian Monte Carlo No-U-Turn Sampler (NUTS) with 4 chains, 1500 iterations (50% warmup, 50% inference) and 10 degrees of freedom \((\delta=10)\) in the hazard function. Further information on calculating the hazard curve and the RMST can be found in [4].
data<-cgd0
cgd0[1:4,]
## id center random treat sex age height weight inherit steroids propylac
## 1 1 204 82888 1 2 12 147 62.0 2 2 2
## 2 2 204 82888 0 1 15 159 47.5 2 2 1
## 3 3 204 82988 1 1 19 171 72.7 1 2 1
## 4 4 204 91388 1 1 12 142 34.0 1 2 1
## hos.cat futime etime1 etime2 etime3 etime4 etime5 etime6 etime7
## 1 2 414 219 373 NA NA NA NA NA
## 2 2 439 8 26 152 241 249 322 350
## 3 2 382 NA NA NA NA NA NA NA
## 4 2 388 NA NA NA NA NA NA NA
Recurrent events and covariates for each patient were encoded as time intervals between events. [CITE THERNEU]
data2<-tmerge(
cgd0[, 1:13],
cgd0,
id = id,
tstop = futime,
infect = event(etime1),
infect = event(etime2),
infect = event(etime3),
infect = event(etime4),
infect = event(etime5),
infect = event(etime6),
infect = event(etime7)
)
data2 <- tmerge(data2, data2, id= id, enum = cumtdc(tstart))
f.null<-formula(Surv(tstart, tstop, infect) ~ 1.0)
f.full<-formula(Surv(tstart, tstop, infect) ~ treat + inherit + steroids)
## Call:
## coxph(formula = f.full, data = data2, cluster = id)
##
## n= 203, number of events= 76
##
## coef exp(coef) se(coef) robust se z Pr(>|z|)
## treat -1.0722 0.3422 0.2619 0.3118 -3.438 0.000585 ***
## inherit 0.1777 1.1944 0.2356 0.3180 0.559 0.576395
## steroids -0.7726 0.4618 0.5169 0.4687 -1.648 0.099310 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## treat 0.3422 2.9219 0.1857 0.6306
## inherit 1.1944 0.8372 0.6404 2.2278
## steroids 0.4618 2.1653 0.1843 1.1573
##
## Concordance= 0.652 (se = 0.04 )
## Likelihood ratio test= 22.49 on 3 df, p=5e-05
## Wald test = 16.81 on 3 df, p=8e-04
## Score (logrank) test = 22.7 on 3 df, p=5e-05, Robust = 10.44 p=0.02
##
## (Note: the likelihood ratio and score tests assume independence of
## observations within a cluster, the Wald and robust score tests do not).
## df AIC
## m.cox.null 0 684.2894
## m.cox.full 3 667.7961
Probability for infection event was calculated under a Bayesian survival analysis framework using a spline-based (M-spline) hazard regression model. [2] The M-Spline hazard function is defined as: \[ \begin{aligned} h(i)=\sum_{l=1}^{L} \gamma_{l}M_{l}(t;k;\delta)exp(\eta_{i}(t))\\ \end{aligned} \] Where \(h_i(t)\) is the hazard of the event for individual i with \(\eta_{i}\) time-dependent predictors at time \(t\). \(l^{th} (l=1,…,L)\) denotes the basis term for a degree \(\delta\) M-spline function evaluated at a vector of knot locations \(k=\{ k_1,…,k_J \}\) and \(\gamma_l\) denotes the \(l^{th}\) M-spline coefficient. For our example, we estimated hazard ratios (HRs) and survival probability curves between treated and untreated patients.
We start with no prior knowledge (default):
# when chains>1 r makes use of viewer
CHAINS <- 10
CORES <- 10
ITER <- 2000
SEED <- 42
# draw from the prior predictive distribution of the stan_surv survival model
prior.stan.cgd <- stan_surv(
formula = f.full,
data = data2,
basehaz = "exp",
prior_PD = TRUE,
chains = CHAINS,
cores = CORES,
iter = ITER,refresh=2000,
seed = SEED)
Let’s use more appropriate priors:
prior.stan.cgd2 <- update(prior.stan.cgd,
prior_intercept = normal(0, 1),
prior = normal(0, .5))
print(prior.stan.cgd2, digits = 3)
## stan_surv
## baseline hazard: exponential
## formula: Surv(tstart, tstop, infect) ~ treat + inherit + steroids
## observations: 203
## events: 76 (37.4%)
## right censored: 127 (62.6%)
## delayed entry: yes
## ------
## Median MAD_SD exp(Median)
## (Intercept) -6.219 1.653 NA
## treat 0.004 0.501 1.004
## inherit -0.002 0.492 0.998
## steroids 0.004 0.518 1.004
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
#Compare them
mcmc_intervals(prior.stan.cgd)
mcmc_intervals(prior.stan.cgd2)
# Null
fit.stan.cgd.exp.f.null <- update(prior.stan.cgd2,
prior_PD = FALSE,
formula=f.null,
basehaz = "exp")
# cubic m-spline tstart2, tstop2
fit.stan.cgd.ms10.f.null <- update(fit.stan.cgd.exp.f.null,
basehaz = "ms",
basehaz_ops = list(df = 10))
# Full
fit.stan.cgd.exp.f.full <- update(prior.stan.cgd2,
prior_PD = FALSE,
formula=f.full,
basehaz = "exp")
# cubic m-spline tstart2, tstop2
fit.stan.cgd.ms10.f.full <- update(fit.stan.cgd.exp.f.full,
basehaz = "ms",
basehaz_ops = list(df = 10))
fits_stan <- list("exp.f.null" = fit.stan.cgd.exp.f.null,
"exp.f.full" = fit.stan.cgd.exp.f.full,
"ms10.f.null" = fit.stan.cgd.ms10.f.null,
"ms10.f.full" = fit.stan.cgd.ms10.f.full
)
# fit.stan.as.ms10<-fit.stan.ms10
print(fit.stan.cgd.exp.f.full, digits = 3)
## stan_surv
## baseline hazard: exponential
## formula: Surv(tstart, tstop, infect) ~ treat + inherit + steroids
## observations: 203
## events: 76 (37.4%)
## right censored: 127 (62.6%)
## delayed entry: yes
## ------
## Median MAD_SD exp(Median)
## (Intercept) -5.527 0.815 NA
## treat -0.825 0.225 0.438
## inherit 0.164 0.214 1.178
## steroids -0.296 0.384 0.744
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
print(fit.stan.cgd.ms10.f.full, digits = 3)
## stan_surv
## baseline hazard: M-splines on hazard scale
## formula: Surv(tstart, tstop, infect) ~ treat + inherit + steroids
## observations: 203
## events: 76 (37.4%)
## right censored: 127 (62.6%)
## delayed entry: yes
## ------
## Median MAD_SD exp(Median)
## (Intercept) 0.682 0.868 NA
## treat -0.874 0.232 0.417
## inherit 0.169 0.217 1.185
## steroids -0.318 0.403 0.728
## m-splines-coef1 0.051 0.026 NA
## m-splines-coef2 0.032 0.029 NA
## m-splines-coef3 0.035 0.029 NA
## m-splines-coef4 0.086 0.044 NA
## m-splines-coef5 0.077 0.046 NA
## m-splines-coef6 0.068 0.047 NA
## m-splines-coef7 0.262 0.079 NA
## m-splines-coef8 0.132 0.100 NA
## m-splines-coef9 0.130 0.103 NA
## m-splines-coef10 0.052 0.052 NA
##
## ------
## * For help interpreting the printed output see ?print.stanreg
## * For info on the priors used see ?prior_summary.stanreg
m.cox.full%>%
tbl_regression(exponentiate = T)
| Characteristic | HR1 | 95% CI1 | p-value |
|---|---|---|---|
| treat | 0.34 | 0.19, 0.63 | <0.001 |
| inherit | 1.19 | 0.64, 2.23 | 0.6 |
| steroids | 0.46 | 0.18, 1.16 | 0.10 |
| 1 HR = Hazard Ratio, CI = Confidence Interval | |||
mcmc_post_ci(fit.stan.cgd.exp.f.full,.95,4)
## HR
## treat 0.44 (0.28 to 0.67)
## inherit 1.18 (0.78 to 1.79)
## steroids 0.74 (0.36 to 1.63)
mcmc_post_ci(fit.stan.cgd.ms10.f.full,.95,4)
## HR
## treat 0.42 (0.26 to 0.65)
## inherit 1.18 (0.76 to 1.81)
## steroids 0.73 (0.34 to 1.68)
plots <- map(fits_stan,plot)
a<-plots[[2]]+
labs(title = "Constant (exp)")+
coord_cartesian(ylim = c(0,.1))+
theme(plot.title = element_text(hjust = .5))
b<-plots[[4]]+labs(title = "M-splines (df=10)")+
coord_cartesian(ylim = c(0,.1))+
theme(plot.title = element_text(hjust = .5))
a+b
data_test <- data.frame(
id = 1:2,
treat = c(0, 1),
inherit = c(1, 1),
steroids = c(1, 1)
)
ndraws=1000
##### Constant (exponential)
psb<-posterior_survfit(fit.stan.cgd.exp.f.full,
newdata = data_test,
times = 0,
extrapolate = T,
condition = FALSE,
return_matrix = F,
control = list(edist = 439),
draws = ndraws)
psb<-psb %>%
left_join(data_test,by="id")
#tidybayes does not work with posterior_survfit yet
b<-psb %>% as_tibble()%>%
ggplot(aes(x=time,y=median,col=factor(treat),fill=factor(treat))) +
# scale_x_continuous(breaks=c(50,100,150,200,250,300,350,400,439))+
geom_ribbon(aes(ymin = ci_lb, ymax = ci_ub),size=0.1,alpha=0.1) +
geom_line()+labs(x="Time (days)",y="",subtitle="Bayesian constant",col="Treatment",fill="Treatment")+
theme_minimal()+theme(legend.position = "none")
# +add_knots(fit.stan.ms10)
####M-spline
psc<-posterior_survfit(fit.stan.cgd.ms10.f.full,
newdata = data_test,
times = 0,
extrapolate = T,
condition = FALSE,
return_matrix = F,
control = list(edist = 439),
draws = ndraws)
psc<-psc %>%
left_join(data_test,by="id")
#tidybayes does not work with posterior_survfit yet
c<-psc %>% as_tibble()%>%
ggplot(aes(x=time,y=median,col=factor(treat),fill=factor(treat))) +
# scale_x_continuous(breaks=c(50,100,150,200,250,300,350,400,439))+
geom_ribbon(aes(ymin = ci_lb, ymax = ci_ub),size=0.1,alpha=0.1) +
geom_line()+labs(x="Time (days)",y="",subtitle="Bayesian M-spline",col="Treatment",fill="Treatment")+
theme_minimal()+theme(legend.position = "none")
# +add_knots(fit.stan.ms10)
#Lets compare with cox
ps2<-survfit(m.cox.full, newdata = data_test)
ps2cox<-data.frame(time=rep(ps2$time,2),
median=c(ps2$surv[,1],ps2$surv[,2]),
ci_lb=c(ps2$lower[,1],ps2$lower[,2]),
ci_ub=c(ps2$upper[,1],ps2$upper[,2]),
treat = rep(c(0, 1),each=length(ps2$time)),
inherit = 1,
steroids = 1
)
a<-ps2cox %>%
ggplot(aes(x=time,y=median,col=factor(treat),fill=factor(treat))) +
geom_ribbon(aes(ymin = ci_lb, ymax = ci_ub),size=0.1,alpha=0.1) +
geom_line()+labs(x="Time (days)",y="Probability of Survival Free\n of Infection",subtitle="COX PH",col="Treatment",fill="Treatment")+
theme_minimal()+theme(legend.position = "bottom")
legend = get_legend(a)
a<-a+theme(legend.position = "none")
(a+b+c)/(plot_spacer()+legend+plot_spacer())
For publishing purposes, survival plots require additional tweaking in ggplot. I have taken inspiration from survminer package that makes a wonderful paper-like plots for cox ph models. Let’s do the same but for our rstanarm model:
annoTextS=4
cbPalette <- c("#4DBBD5","#E64B35")
grid <- seq(0,439,by=100)
data_test <- data.frame(
id = 1:2,
treat = c(0, 1),
inherit = c(1, 1),
steroids = c(1, 1)
) %>%
mutate(Strata =ifelse(treat==0, "Untreated", "Treated"))
ndraws=1000
# already collapsed
ps<-posterior_survfit(fit.stan.cgd.ms10.f.full,
newdata = data_test,
times = 0,
extrapolate = T,
condition = FALSE,
return_matrix = F,
control = list(edist = 439),
draws = ndraws)
ps<-ps %>%
left_join(data_test,by="id")
# prepare HR annotations
text.df<-data.frame(
x=c(0),
y=c(0.05),
label=c("HR Treated=0.44 (0.28 to 0.67)")
)
################ survival curves
a<-ps %>%as_tibble()%>%
ggplot(aes(x=time,y=median,col=Strata)) +
geom_ribbon(aes(ymin = ci_lb, ymax = ci_ub,fill=Strata),
# fill = "gray90",
alpha=0.2,
size=0.0) +
geom_line()+
scale_color_manual(values=cbPalette)+
scale_fill_manual(values = cbPalette)+
scale_x_continuous(breaks = grid)+
labs(x="",y="Probability of Survival Free\n of Infection",col="Strata")+
survminer::theme_survminer(base_family = "Times New Roman")
a<-a+annotate(geom="text",x=text.df$x,y=text.df$y,label=text.df$label,
size=annoTextS,
hjust=0,family="Times New Roman")+
theme(legend.position = "right",
text=element_text(family="Times New Roman"),
plot.margin = unit(c(0,0,0,0), "cm"))
#obtain legend object
legend = get_legend(a)
# a<-a+theme(legend.position = "none")
################ Risk table as ggplot element
datatr <- fit.stan.cgd.ms10.f.full$data %>% ungroup() %>%
mutate(Strata = factor(
ifelse(treat==0, "Untreated", "Treated")
))
summary(datatr$Strata)
## Treated Untreated
## 83 120
patients<-datatr %>%
group_by(id) %>%
arrange(tstart) %>%
slice_head()
riskcounts.df<-rbind(
RiskSetCount(grid,patients,strataoi ="Untreated"),
RiskSetCount(grid,patients,strataoi ="Treated")
)
tabrisk<-ggplot(riskcounts.df, aes(x = time,y = factor(strata),
label = as.character(value)
)) +
geom_text(size = 4,family = "Times New Roman")+
coord_cartesian(xlim = c(0,439))+
scale_x_continuous(breaks=grid)+
scale_y_discrete(limits=rev(c(
"Untreated",
"Treated"
)),labels=c("",""))+
labs(x="Time (months)",y="Strata",subtitle = "Number at risk")+
survminer::theme_survminer(base_family = "Times New Roman")+
theme(legend.position = "none",
text = element_text(family = "Times New Roman"),
axis.text.y = element_text( hjust = 1 ),
axis.ticks.y = element_line(size = 2,colour = rev(cbPalette)),
axis.ticks.length.y = unit(15, "pt"),
plot.margin = unit(c(0,0,0,0), "cm"))
(a<-a / tabrisk+plot_layout(ncol=1,heights = c(3,1)))
Complementary to HR, we quantify time to recurrence (TTR) using the difference in the Restricted Mean Survival Time (RMST). Clinically, RMST is defined as the average event-free survival time among a population up to a fixed clinically important follow-up time \((\tau)\). [Cite rmst cardio, Royston] It is estimated using the area under the curve between start of the follow-up \((t=0)\) and a particular time horizon (\(t=\tau\)). The RMST is denoted by \(\rho(\tau)\) and approximated using a \(15\)-points Gauss-Kronrod quadrature as:
\[ \begin{aligned} \rho(\tau)\approx\frac{\tau}{2}\sum_{i=1}^{15}w_{i}S(\frac{\tau}{2}+\frac{\tau}{2}\chi_i)\\ \end{aligned} \]
tau <- c(180, 365)
rmst.trt <-
map(tau,
~ rmst_check_plot(
fit.stan.cgd.ms10.f.full,
data_test,
tau = .
))
#number of digits for the table
ndig=1
rmst.table={}
for(i in 1:length(tau)) {
treated=paste0(
round(median(rmst.trt[[i]][[1]]$rmstA),ndig),
" (",
round(quantile(rmst.trt[[i]][[1]]$rmstA,prob=c(0.025)),ndig),
" to ",
round(quantile(rmst.trt[[i]][[1]]$rmstA,prob=c(0.975)),ndig),
")"
)
notreated=paste0(
round(median(rmst.trt[[i]][[1]]$rmstB),ndig),
" (",
round(quantile(rmst.trt[[i]][[1]]$rmstB,prob=c(0.025)),ndig),
" to ",
round(quantile(rmst.trt[[i]][[1]]$rmstB,prob=c(0.975)),ndig),
")"
)
diff=paste0(
round(median(rmst.trt[[i]][[1]]$diffA_B),ndig),
" (",
round(quantile(rmst.trt[[i]][[1]]$diffA_B,prob=c(0.025)),ndig),
" to ",
round(quantile(rmst.trt[[i]][[1]]$diffA_B,prob=c(0.975)),ndig),
")"
)
obs=data.frame(tau=tau[i],
RMST.A=treated,
RMST.B=notreated,
RMST.diff=diff
)
rmst.table<-rbind(rmst.table,obs)
}
rmst.table%>%
kableExtra::kbl() %>%
kableExtra::kable_paper("hover",full_width=F)
| tau | RMST.A | RMST.B | RMST.diff |
|---|---|---|---|
| 180 | 141.8 (105.8 to 163.5) | 162.8 (141.2 to 173.3) | -20.3 (-41.8 to -7.5) |
| 365 | 222.7 (140.8 to 292.3) | 292.8 (225.6 to 333.8) | -67.4 (-112.8 to -29.3) |
# join all 3 measures
rmst.trt.gg<-rbind(
rmst.trt[[1]][[1]],
rmst.trt[[2]][[1]]
)
rmst.trt.gg$tau<-factor(as.character(rmst.trt.gg$tau),levels=c("180","365"))
# wide to long for easy manipulation
rmst.trt.gg<-gather(rmst.trt.gg,condition,time,rmstA:ratioA_B)
a<-ggplot() +
geom_point(
data = rmst.trt.gg %>% filter(condition %in% c("rmstA", "rmstB")),
aes(x = tau, y = time, group = condition,col=condition),
position = "jitter",
alpha = 0.05,
size = 0.01
) +
scale_color_manual(values = c("red", "blue"),labels=c("No","Yes"))+
geom_boxplot()+
stat_summary(
fun=mean,
geom="line"
)+
labs(y = "Time-free of infection",col="Treatment") +
guides(colour = guide_legend(override.aes = list(size=10)))+
theme_bw()
a
bayesplot_grid(
a,
rmst.trt[[1]]$p3,
rmst.trt[[2]]$p3,
# rmst.ar[[3]]$p3,
grid_args = list(ncol = 2),
# titles = paste0("RMST (tau=", tau, ")"),
# subtitles = rep("with medians and 95% CI", 4)
subtitles = c("Time-free evolution","Tau=180","Tau=365")
)
On average, treated patients had a median of 20 and 67 additional days free of infection compared to non-treated patients, at 6 months and 1-year of follow-up.
To assess the fit of the regression models, we performed model comparison (M-Spline vs baseline exponential hazard) at different levels.
At model level (MCMC mixing).
At hazard ratios level comparing estimates VS posteriors.
At prediction level (LOGO, WAIC and C-Index.
Are chains mixing well?
color_scheme_set("mix-blue-red")
mcmc_trace(fit.stan.cgd.ms10.f.full,
pars=c("treat", "(Intercept)"),
facet_args = list(ncol = 1, strip.position = "left")
)
How far is our distribution compared to the cox’s estimate? Comparing the three models, we can see M-spline-based is closer to the cox estimate.
# extract HR from classical coxph for arm=B
exp(coef(m.cox.full))[1]
## treat
## 0.3422474
base_cox_hr <- vline_at(exp(coef(m.cox.full))[1], color = "green")
a<-mcmc_hist(prior.stan.cgd2,
pars = c("treat"),
transformations = exp,
binwidth = 0.001) + base_cox_hr+labs(subtitle="Priors")
b<-mcmc_hist(fit.stan.cgd.exp.f.full,
pars = c("treat"),
transformations = exp,
binwidth = 0.001) + base_cox_hr+labs(subtitle = "Posterior (const)")
c<-mcmc_hist(fit.stan.cgd.ms10.f.full,
pars = c("treat"),
transformations = exp,
binwidth = 0.001) + base_cox_hr+labs(subtitle = "Posterior (ms-10)")
a+b+c
Fit is examined by using a leave-one-out cross validation based on the expected log predictive density (elpd). According to Gelman et al., the lower the elpd score the better the model fit is. A leave-one-out is used to avoid over optimistic due to overfitting. We assume “leaving-out” an individual rather than an observation for both goodness of fit and calibration schemes.
post <- as.array(fit.stan.cgd.ms10.f.full)
ids<-fit.stan.cgd.exp.f.full$data$id
chain_id=rep(1:dim(post)[2], each = dim(post)[1])
#### model exp null
#1. Get log likehood per infection event
myllk.exp.f.null<-log_lik(fit.stan.cgd.exp.f.null,merge_chains = F)
#2. Join llk by patient
myllk2.exp.f.null<-llkByPatient(llk = myllk.exp.f.null,ids = ids)
## [1] 10000 203
## [1] 128 10000
# 3. Effective samples
reff.exp.f.null<-relative_eff(myllk2.exp.f.null,chain_id = chain_id,cores = 10)
#### model exp full
#1. Get log likehood per infection event
myllk.exp.f.full<-log_lik(fit.stan.cgd.exp.f.full,merge_chains = F)
#2. Join llk by patient
myllk2.exp.f.full<-llkByPatient(llk = myllk.exp.f.full,ids = ids)
## [1] 10000 203
## [1] 128 10000
# 3. Effective samples
reff.exp.f.full<-relative_eff(myllk2.exp.f.full,chain_id = chain_id,cores = 10)
#model ms10 null
myllk.ms10.f.null<-log_lik(fit.stan.cgd.ms10.f.null)
myllk2.ms10.f.null<-llkByPatient(llk = myllk.ms10.f.null,ids = ids)
## [1] 10000 203
## [1] 128 10000
reff.ms10.f.null<-relative_eff(myllk2.ms10.f.null,chain_id = chain_id,cores = 10)
#model ms10 full
myllk.ms10.f.full<-log_lik(fit.stan.cgd.ms10.f.full)
myllk2.ms10.f.full<-llkByPatient(llk = myllk.ms10.f.full,ids = ids)
## [1] 10000 203
## [1] 128 10000
reff.ms10.f.full<-relative_eff(myllk2.ms10.f.full,chain_id = chain_id,cores = 10)
# Versus frequentist approaches
AIC(m.cox.null,m.cox.full)
## df AIC
## m.cox.null 0 684.2894
## m.cox.full 3 667.7961
#leave-one-out ELPD
compare(
loo(myllk2.exp.f.null, r_eff = reff.exp.f.null),
loo(myllk2.exp.f.full, r_eff = reff.exp.f.full),
loo(myllk2.ms10.f.null, r_eff = reff.ms10.f.null),
loo(myllk2.ms10.f.full, r_eff = reff.ms10.f.full)
)%>%
kableExtra::kbl() %>%
kableExtra::kable_paper("hover",full_width=F)
| elpd_diff | se_diff | elpd_loo | se_elpd_loo | p_loo | se_p_loo | looic | se_looic | |
|---|---|---|---|---|---|---|---|---|
| loo(myllk2.ms10.f.full, r_eff = reff.ms10.f.full) | 0.000000 | 0.000000 | -538.4634 | 76.34679 | 10.661322 | 2.1729935 | 1076.927 | 152.6936 |
| loo(myllk2.exp.f.full, r_eff = reff.exp.f.full) | -3.248304 | 3.663320 | -541.7117 | 76.20418 | 5.078211 | 1.3992236 | 1083.423 | 152.4084 |
| loo(myllk2.ms10.f.null, r_eff = reff.ms10.f.null) | -7.332385 | 4.526938 | -545.7958 | 77.89670 | 7.215401 | 1.3588574 | 1091.592 | 155.7934 |
| loo(myllk2.exp.f.null, r_eff = reff.exp.f.null) | -10.350965 | 6.078644 | -548.8144 | 77.78945 | 2.013492 | 0.6077481 | 1097.629 | 155.5789 |
#leave-one-out WAIC
compare(
waic(myllk2.exp.f.null, r_eff = reff.exp.f.null),
waic(myllk2.exp.f.full, r_eff = reff.exp.f.full),
waic(myllk2.ms10.f.null, r_eff = reff.ms10.f.null),
waic(myllk2.ms10.f.full, r_eff = reff.ms10.f.full)
)%>%
kableExtra::kbl() %>%
kableExtra::kable_paper("hover",full_width=F)
| elpd_diff | se_diff | elpd_waic | se_elpd_waic | p_waic | se_p_waic | waic | se_waic | |
|---|---|---|---|---|---|---|---|---|
| waic(myllk2.ms10.f.full, r_eff = reff.ms10.f.full) | 0.000000 | 0.000000 | -538.3963 | 76.32857 | 10.594244 | 2.1479533 | 1076.793 | 152.6571 |
| waic(myllk2.exp.f.full, r_eff = reff.exp.f.full) | -3.277120 | 3.663874 | -541.6734 | 76.19422 | 5.039949 | 1.3821537 | 1083.347 | 152.3884 |
| waic(myllk2.ms10.f.null, r_eff = reff.ms10.f.null) | -7.375539 | 4.535321 | -545.7719 | 77.89051 | 7.191478 | 1.3520872 | 1091.544 | 155.7810 |
| waic(myllk2.exp.f.null, r_eff = reff.exp.f.null) | -10.416951 | 6.084981 | -548.8133 | 77.78919 | 2.012400 | 0.6074468 | 1097.627 | 155.5784 |
In both, null and full scenarios, elpd is better in M-splines alternatives, of course full model is better than null as expected.
Let’s focus only on full models for the concordance metric.
ndraws=500
data_test<-fit.stan.cgd.exp.f.full$data
data_test$coxlp.f.full<-predict(m.cox.full,newdata = data_test,"lp")
data_test$coxsurv.f.full<-predict(m.cox.full,newdata = data_test,"survival")
#loghaz refers to lp in bayes
data_test$explp.f.full<-posterior_survfit(fit.stan.cgd.exp.f.full,
newdata = data_test,
extrapolate = F,
type="loghaz",
draws = ndraws,return_matrix = F,
times = "tstart",
last_time = "tstop")$median
data_test$expsurv.f.full<-posterior_survfit(fit.stan.cgd.exp.f.full,
newdata = data_test,
extrapolate = F,
type="surv",
draws = ndraws,return_matrix = F,
times = "tstart",
last_time = "tstop")$median
#M-splines
data_test$ms10lp.f.full<-posterior_survfit(fit.stan.cgd.ms10.f.full,
newdata = data_test,
extrapolate = F,
type="loghaz",
draws = ndraws,return_matrix = F,
times = "tstart",
last_time = "tstop")$median
data_test$ms10surv.f.full<-posterior_survfit(fit.stan.cgd.ms10.f.full,
newdata = data_test,
extrapolate = F,
type="surv",
draws = ndraws,return_matrix = F,
times = "tstart",
last_time = "tstop")$median
# Pairs
pairs(~coxlp.f.full+explp.f.full+ms10lp.f.full, data_test,
upper.panel = panel.cor, # Correlation panel
lower.panel = panel.smooth)
pairs(~coxsurv.f.full+expsurv.f.full+ms10surv.f.full, data_test,
upper.panel = panel.cor, # Correlation panel
lower.panel = panel.smooth)
Graphically, we observe some correlation between cox and bayesian predictions. Let’s estimate C-index for our predictions
y_test <- Surv(data_test$tstart,
data_test$tstop,
data_test$infect)
# cindex for linear predictor (log hazard)
concordance(y_test~data_test$coxlp.f.full,reverse = T) #it works with risk
## Call:
## concordance.formula(object = y_test ~ data_test$coxlp.f.full,
## reverse = T)
##
## n= 203
## Concordance= 0.652 se= 0.03269
## concordant discordant tied.x tied.y tied.xy
## 4066 1758 1767 6 0
concordance(y_test~data_test$explp.f.full,data = data_test,reverse = T)
## Call:
## concordance.formula(object = y_test ~ data_test$explp.f.full,
## data = data_test, reverse = T)
##
## n= 203
## Concordance= 0.652 se= 0.03269
## concordant discordant tied.x tied.y tied.xy
## 4066 1758 1767 6 0
concordance(y_test~data_test$ms10lp.f.full,data = data_test,reverse = T)
## Call:
## concordance.formula(object = y_test ~ data_test$ms10lp.f.full,
## data = data_test, reverse = T)
##
## n= 203
## Concordance= 0.5672 se= 0.03543
## concordant discordant tied.x tied.y tied.xy
## 3818 2798 975 6 0
#fixed time
times = as.double(seq(5, 439, 100))
summary(data_test$tstart)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0 0.0 0.0 69.5 121.0 373.0
#most common time slots
times2<-data_test %>%
dplyr::filter(tstart>0.0) %>%
mutate(ints = cut(tstart ,
breaks = seq(0, 439, 100),
include.lowest = FALSE,
right = FALSE)) %>%
dplyr::group_by(ints,tstart) %>%
dplyr::summarise(myn=n()) %>%
slice_max(myn, with_ties = F)
times2
## # A tibble: 4 × 3
## # Groups: ints [4]
## ints tstart myn
## <fct> <dbl> <int>
## 1 [0,100) 65 2
## 2 [100,200) 121 2
## 3 [200,300) 265 2
## 4 [300,400) 373 2
times2<-times2%>%
pull(tstart)
times
## [1] 5 105 205 305 405
(times<-times2)
## [1] 65 121 265 373
y_test.f.full <- filter(data_test, tstart %in% times) %>%
ungroup() %>%
select(c("tstart","tstop","infect"))
library(data.table)
res<-calibrate(data = data_test,times = times,y = y_test.f.full,
tstart_col = "tstart",tstop_col ="tstop",status_col = "infect",
n_groups = 10,surv_col = "coxsurv.f.full" )
autoplot(res)+labs(subtitle = "Cox PH")
res<-calibrate(data = data_test,times = times,y = y_test.f.full,
tstart_col = "tstart",tstop_col ="tstop",status_col = "infect",
n_groups = 10,surv_col = "expsurv.f.full" )
autoplot(res)+labs(subtitle = "Constant (exp)")
res<-calibrate(data = data_test,times = times,y = y_test.f.full,
tstart_col = "tstart",tstop_col ="tstop",status_col = "infect",
n_groups = 10,surv_col = "ms10surv.f.full" )
autoplot(res)+labs(subtitle = "M-spline")
We obtain mixed results in the validation part for all approaches, reasons are manifold. Survival analysis is hard, specially in very tricky datasets (recurrent visits). I suggest you select model based on a combination of more than one validation technique (HR + elpd, HR + C-Index). Also, I suggest evidence-driven covariate-selection with additional caution on non-normal covariates.
Therneau, T., Crowson, C., and Atkinson E. “Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model”. Survival Vignettes. Accessed May 1, 2023.
Therneau T (2023). A Package for Survival Analysis in R. R package version 3.5-5, https://CRAN.R-project.org/package=survival.
Ramsay, J. O. 1988. “Monotone Regression Splines in Action.” Statistical Science 3 (4): 425–41. https: //doi.org/10.1214/ss/1177012761. Accessed July 1, 2022.
Royston, Patrick, and Mahesh KB Parmar. 2013. “Restricted Mean Survival Time: An Alternative to the Hazard Ratio for the Design and Analysis of Randomized Trials with a Time-to-Event Outcome.” BMC Medical Research Methodology 13 (1): 152.
Brilleman SL, Elçi EM, Novik JB, Wolfe R. Bayesian Survival Analysis Using the rstanarm R Package. 2020. arXiv preprint. arXiv:2002.09633. URL: [https://arxiv.org/abs/2002.09633.](https://arxiv.org/abs/2002.09633) Accessed July 1, 2022.